
International Journal of Medical Informatics

Elsevier BV

All preprints, ranked by how well they match the content profile of the International Journal of Medical Informatics, based on 25 papers previously published here. The average preprint has a 0.06% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.

1
Development and Evaluation of Machine Learning Models for the Detection of Emergency Department Patients with Opioid Misuse from Clinical Notes

Shahid, U.; Parde, N.; Smith, D. L.; Dickinson, G.; Bianco, J.; Thorpe, D.; Hota, M.; Afshar, M.; Karnik, N. S.; Chhabra, N.

2024-12-12 emergency medicine 10.1101/2024.12.11.24318875 medRxiv
Top 0.1%
44.3%

Objectives: The accurate identification of Emergency Department (ED) encounters involving opioid misuse is critical for health services, research, and surveillance. We sought to develop natural language processing (NLP)-based models for the detection of ED encounters involving opioid misuse. Methods: A sample of ED encounters enriched for opioid misuse was manually annotated and clinical notes extracted. We evaluated classic machine learning (ML) methods, fine-tuning of publicly available pretrained language models, and a previously developed convolutional neural network opioid classifier for use on hospitalized patients (SMART-AI). Performance was compared to ICD-10-CM codes. Both raw text and text transformed to the Unified Medical Language System were evaluated. Face validity was evaluated by term feature importance. Results: There were 1,123 encounters used for training, validation, and testing. Of the classic ML methods, XGBoost had the highest AUPRC (0.936), accuracy (0.887), and F1 score (0.863), outperforming ICD-10-CM codes (accuracy 0.870; F1 0.830). Logistic regression, support vector machine, and XGBoost models had higher AUPRC using transformed text, while decision trees performed better using raw text. Excluding XGBoost, fine-tuned pretrained language models outperformed classic ML methods. The best-performing model was the fine-tuned SMART-AI-based model with domain adaptation (AUPRC 0.948; accuracy 0.882; F1 0.851). Explainability analyses showed the most predictive terms were heroin, opioids, alcoholic intoxication, chronic, cocaine, opiates, and suboxone. Conclusions: NLP-based models outperform entry of ICD-10-CM diagnosis codes for the detection of ED encounters with opioid misuse. Fine-tuning with domain adaptation for pretrained language models resulted in improved performance.

2
Augmenting maternal clinical cohort data with administrative laboratory dataset linkages: a validation study

Rossouw, L.; Ngcobo, N.; Clouse, K.; Nattey, C.; Technau, K.-G.; Maskew, M.

2024-06-20 hiv aids 10.1101/2024.06.19.24309149 medRxiv
Top 0.1%
40.6%

Background: The use of big data and large language models in healthcare can play a key role in improving patient treatment and healthcare management, especially when applied to large-scale administrative data. A major challenge to achieving this is ensuring that patient confidentiality and personal information are protected. One way to overcome this is by augmenting clinical data with administrative laboratory dataset linkages in order to avoid the use of demographic information. Methods: We explored an alternative method to examine patient files from a large administrative dataset in South Africa (the National Health Laboratory Services, or NHLS), linking external data to the NHLS database using specimen barcodes associated with laboratory tests. This offers a deterministic way of performing data linkages without accessing demographic information. In this paper, we quantify the performance metrics of this approach. Results: The linkage of the large NHLS data to external hospital data using specimen barcodes achieved a 95% success rate. Of the 1,200 records in the validation sample, 87% were exact matches and 9% were matches with typographic correction. The remaining 5% were either complete mismatches or were due to duplicates in the administrative data. Conclusions: The high success rate indicates the reliability of using barcodes for linking data without demographic identifiers. Specimen barcodes are an effective tool for deterministic linking in health data and may provide a method of creating large, linked datasets without compromising patient confidentiality.
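The deterministic barcode linkage this abstract describes, including the "exact match vs. typographic correction vs. mismatch" breakdown, can be sketched in a few lines. The barcode values and the single-edit typo check below are illustrative assumptions, not the authors' actual implementation.

```python
def edit_distance_leq_one(a: str, b: str) -> bool:
    """True if a and b differ by at most one substitution, insertion, or deletion."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # same length: allow exactly one substitution
        return sum(x != y for x, y in zip(a, b)) == 1
    # lengths differ by one: deleting one char from the longer must give the shorter
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))

def link(clinical: dict, lab_index: set) -> dict:
    """Classify each clinical record's specimen barcode against the lab index."""
    results = {}
    for record_id, barcode in clinical.items():
        if barcode in lab_index:
            results[record_id] = "exact"
        elif any(edit_distance_leq_one(barcode, lb) for lb in lab_index):
            results[record_id] = "typo-corrected"
        else:
            results[record_id] = "unmatched"
    return results
```

Because the key is a specimen barcode rather than a name or date of birth, no demographic field is needed for the join, which is the confidentiality advantage the study highlights.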

3
Labelling chest x-ray reports using an open-source NLP and ML tool for text data binary classification

Towfighi, S.; Agarwal, A.; Mak, D. Y.; Verma, A.

2019-11-22 health informatics 10.1101/19012518 medRxiv
Top 0.1%
32.7%

The chest x-ray is a commonly requested diagnostic test on internal medicine wards that can diagnose many acute pathologies needing intervention. We developed a natural language processing (NLP) and machine learning (ML) model to identify the presence of opacities or endotracheal intubation on chest x-rays using only the radiology report. This is a preliminary report of our work and findings. Using the General Medicine Inpatient Initiative (GEMINI) dataset, which houses inpatient clinical and administrative data from 7 major hospitals, we retrieved 1,000 plain film radiology reports, which were classified according to 4 labels by an internal medicine resident. NLP/ML models were developed to identify the following on the radiograph reports: the report is that of a chest x-ray, there is definite absence of an opacity, there is definite presence of an opacity, the report is a follow-up report with minimal details in its text, and there is an endotracheal tube in place. Our model development methodology included a random search over either TF-IDF or bag-of-words vectorization, along with a random search over various ML models. Our Python scripts were made publicly available on GitHub to allow other parties to train models using their own text data. 100 randomly generated ML pipelines were compared using 10-fold cross-validation on 75% of the data, while 25% of the data was held out for generalizability testing. With respect to the question of whether a chest x-ray definitely lacks an opacity, the model's performance metrics were accuracy of 0.84, precision of 0.94, recall of 0.81, and receiver operating characteristic area under the curve of 0.86. Model performance was worse when trained against a highly imbalanced dataset despite the use of an advanced oversampling technique.
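The TF-IDF vectorization step this abstract mentions can be sketched without any ML library; a minimal pure-Python version with toy tokenized reports (not GEMINI data) follows, using the smoothed IDF convention scikit-learn applies by default.

```python
import math
from collections import Counter

def tfidf(docs: list) -> list:
    """Term frequency x inverse document frequency for tokenized documents.

    Uses smoothed IDF, log((1 + n) / (1 + df)) + 1, so terms appearing in
    every document still get a nonzero weight.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency counts each doc once per term
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (cnt / len(doc)) * idf[t] for t, cnt in tf.items()})
    return vectors
```

A rarer term such as "present" scores higher than a term like "opacity" that occurs in every report, which is exactly the weighting a downstream classifier exploits.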

4
Detecting Problematic Opioid Use in the Electronic Health Record: Automation of the Addiction Behaviors Checklist in a Chronic Pain Population

Chatham, A. H.; Bradley, E. D.; Schirle, L.; Sanchez-Roige, S.; Samuels, D. C.; Jeffery, A. D.

2023-06-12 health informatics 10.1101/2023.06.08.23290894 medRxiv
Top 0.1%
28.4%

Importance: Individuals whose chronic pain is managed with opioids are at high risk of developing an opioid use disorder. Large datasets, such as electronic health records, are required for conducting studies that assist with identification and management of problematic opioid use. Objective: Determine whether regular expressions, a highly interpretable natural language processing technique, could automate a validated clinical tool (the Addiction Behaviors Checklist) to expedite the identification of problematic opioid use in the electronic health record. Design: This cross-sectional study reports on a retrospective cohort with data analyzed from 2021 through 2023. The approach was evaluated against a blinded, manually reviewed holdout test set of 100 patients. Setting: The study used data from Vanderbilt University Medical Center's Synthetic Derivative, a de-identified version of the electronic health record for research purposes. Participants: This cohort comprised 8,063 individuals with chronic pain. Chronic pain was defined by International Classification of Disease codes occurring on at least two different days. We collected demographic, billing code, and free-text notes from patients' electronic health records. Main Outcomes and Measures: The primary outcome was the evaluation of the automated method in identifying patients demonstrating problematic opioid use and its comparison to opioid use disorder diagnostic codes. We evaluated the methods with F1 scores and areas under the curve, indicators of sensitivity, specificity, and positive and negative predictive value. Results: The cohort comprised 8,063 individuals with chronic pain (mean [SD] age at earliest chronic pain diagnosis, 56.2 [16.3] years; 5,081 [63.0%] female and 2,982 [37.0%] male patients; 76 [1.0%] Asian, 1,336 [16.6%] Black, 56 [1.0%] other race, 30 [0.4%] unknown race, and 6,499 [80.6%] White patients; 135 [1.7%] Hispanic/Latino, 7,898 [98.0%] Non-Hispanic/Latino, and 30 [0.4%] unknown ethnicity patients). The automated approach identified individuals with problematic opioid use who were missed by diagnostic codes and outperformed diagnostic codes in F1 scores (0.74 vs. 0.08) and areas under the curve (0.82 vs. 0.52). Conclusions and Relevance: This automated data extraction technique can facilitate earlier identification of people at risk for, and suffering from, problematic opioid use, and create new opportunities for studying long-term sequelae of opioid pain management. Key Points: Question: Can an interpretable natural language processing method automate a valid, reliable clinical tool in order to expedite the identification of problematic opioid use in the electronic health record? Findings: In this cross-sectional study of patients with chronic pain, an automated natural language processing approach identified individuals with problematic opioid use who were missed by diagnostic codes. Meaning: Regular expressions can be used to automatically identify problematic opioid use in an interpretable and generalizable manner.
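The regular-expression screening idea behind this study can be sketched as a small pattern dictionary applied to free-text notes. The patterns and behavior names below are illustrative assumptions; the published tool automates the full, clinically validated Addiction Behaviors Checklist, not these three toy rules.

```python
import re

# Hypothetical checklist-style behaviors mapped to illustrative regexes.
PATTERNS = {
    "early_refill": re.compile(r"\bearly (refill|renewal)\b", re.I),
    "lost_rx": re.compile(r"\b(lost|stolen) (prescription|medication)s?\b", re.I),
    "dose_escalation": re.compile(r"\b(increas\w+|escalat\w+) (the )?dose\b", re.I),
}

def screen_note(note: str) -> set:
    """Return the set of checklist behaviors whose pattern matches the note."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(note)}
```

Each match is directly traceable to a pattern and a span of text, which is the interpretability advantage the abstract claims for regular expressions over opaque classifiers.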

5
Natural Language Processing for Clinical Laboratory Data Repository Systems: Implementation and Evaluation for Respiratory Viruses

Dolatabadi, E.; Chen, B.; Buchan, S. A.; Austin, A. M.; Azimaee, M.; McGeer, A.; Mubareka, S.; Kwong, J. C.

2022-11-29 health informatics 10.1101/2022.11.28.22282767 medRxiv
Top 0.1%
25.7%

Background: With the growing volume and complexity of laboratory repositories, it has become tedious to parse unstructured data into structured and tabulated formats for secondary uses such as decision support, quality assurance, and outcome analysis. However, advances in Natural Language Processing (NLP) approaches have enabled efficient and automated extraction of clinically meaningful medical concepts from unstructured reports. Objective: In this study, we aimed to determine the feasibility of using an NLP model for information extraction as an alternative to a time-consuming and operationally resource-intensive handcrafted rule-based tool. We therefore sought to develop and evaluate a deep learning-based NLP model to derive knowledge and extract information from text-based laboratory reports sourced from a provincial laboratory repository system. Methods: The NLP model, a hierarchical multi-label classifier, was trained on a corpus of laboratory reports covering testing for 14 different respiratory viruses and viral subtypes. The corpus included 85k unique laboratory reports annotated by eight Subject Matter Experts (SMEs). The model's performance stability and variation were analyzed across fine-grained and coarse-grained classes. The model's generalizability was also evaluated internally and externally on various test sets. Results: The NLP model was trained several times with random initialization on the development corpus, and the results of the top ten best-performing models are presented in this paper. Overall, the NLP model performed well on internal, out-of-time (pre-COVID-19), and external (different laboratories) test sets, with micro-averaged F1 scores >94% across all classes. Higher precision and recall scores with less variability were observed for the internal and pre-COVID-19 test sets. As expected, the model's performance varied across categories and virus types due to the imbalanced nature of the corpus and sample sizes per class. There were intrinsically fewer classes of viruses being detected than those tested; therefore, the model's performance (lowest F1 score of 57%) was noticeably lower in the "detected" cases. Conclusions: We demonstrated that deep learning-based NLP models are promising solutions for information extraction from text-based laboratory reports. These approaches enable scalable, timely, and practical access to high-quality and encoded laboratory data if integrated into laboratory information system repositories.

6
Leveraging Artificial Intelligence and Data Science for Integration of Social Determinants of Health in Emergency Medicine: A Scoping Review

Abbott, E.; Apakama, D.; Richardson, L.; Chan, L.; Carr, B.; Nadkarni, G.

2023-10-18 emergency medicine 10.1101/2023.10.17.23297158 medRxiv
Top 0.1%
23.1%

Objective: Social Determinants of Health (SDOH) are critical drivers of health disparities and patient outcomes. However, accessing and collecting patient-level SDOH data can be operationally challenging in the emergency department clinical setting, requiring innovative approaches. This scoping review examines the potential of artificial intelligence (AI) and data science for modeling, extraction, and incorporation of SDOH data specifically within emergency departments (EDs), further identifying areas for advancement and investigation. Methods: We conducted a standardized search across Medline (Ovid), Embase (Ovid), CINAHL, Web of Science, and ERIC databases for studies published between 2015 and 2022. We focused on identifying studies employing AI or data science related to SDOH within emergency care contexts or conditions. Two specialized reviewers in Emergency Medicine and clinical informatics independently assessed each article, resolving discrepancies through iterative reviews and discussion. We then extracted data covering study details, methodologies, patient demographics, care settings, and principal outcomes. Results: Of the 1,047 studies screened, 26 met the inclusion criteria. Notably, 9 of the 26 studies concentrated solely on ED patients. Conditions studied spanned broad Emergency Medicine complaints and conditions, including sepsis, acute myocardial infarction, and asthma. The majority (n=16) explored multiple SDOH domains, with homelessness/housing insecurity and neighborhood/built environment predominating. Machine learning (ML) techniques were utilized in 23 of the studies, with natural language processing (NLP) the most common approach (n=11). Rule-based (n=5), deep learning (n=2), and pattern matching (n=4) were the most common NLP techniques used. NLP models in the reviewed studies displayed significant predictive performance, with F1 scores ranging between 0.40 and 0.75 and specificities nearing 95.9%. Conclusion: Although in its infancy, the convergence of AI and data science techniques, especially ML and NLP, with SDOH in Emergency Medicine offers transformative possibilities for better usage and integration of social data into clinical care and research. With a significant focus on the ED and notable NLP model performance, there is an imperative to standardize SDOH data collection, refine algorithms for diverse patient groups, and champion interdisciplinary synergies. These efforts aim to harness SDOH data optimally, enhancing patient care and mitigating health disparities. Our research underscores the vital need for continued investigation in this domain.

7
Fine-tuned large language models for answering questions about full-text biomedical research studies

Tao, K.; Zhou, J.; Osman, Z. A.; Ahluwalia, V.; Sabati, C.; Shafer, R. W.

2024-10-30 hiv aids 10.1101/2024.10.28.24316263 medRxiv
Top 0.1%
23.1%

Background: Few studies have explored the degree to which fine-tuning a large language model (LLM) can improve its ability to answer a specific set of questions about a research study. Methods: We created an instruction set comprising 250 marked-down studies of HIV drug resistance, 16 questions per study, answers to each question, and explanations for each answer. The questions were broadly relevant to studies of pathogenic human viruses, including whether a study reported viral genetic sequences and the demographics and antiviral treatments of the persons from whom sequences were obtained. We fine-tuned GPT-4o-mini (GPT-4o), Llama3.1-8B-Instruct (Llama3.1-8B), and Llama3.1-70B-Instruct (Llama3.1-70B) using a quantized low-rank adapter (QLoRA). We assessed the accuracy, precision, and recall of each base and fine-tuned model in answering the same questions on a test set comprising 120 different studies. Paired t-tests and Wilcoxon signed-rank tests were used to compare base models to one another, fine-tuned models to their respective base models, and the fine-tuned models to one another. Results: Prior to fine-tuning, GPT-4o displayed significantly greater performance than both Llama3.1-70B and Llama3.1-8B due to its greater precision compared with Llama3.1-70B and greater precision and recall compared with Llama3.1-8B; there was no difference in performance between Llama3.1-70B and Llama3.1-8B. After fine-tuning, both GPT-4o and Llama3.1-70B, but not Llama3.1-8B, displayed significantly improved performance compared with their base models. The improved performance of GPT-4o resulted from a mean 6% increase in precision and a 9% increase in recall; the improved performance of Llama3.1-70B resulted from a 15% increase in precision. After fine-tuning, Llama3.1-70B significantly outperformed Llama3.1-8B but did not perform as well as the fine-tuned GPT-4o model, which displayed superior recall. Conclusion: Fine-tuning GPT-4o and Llama3.1-70B, but not the smaller Llama3.1-8B, led to marked improvement in answering specific questions about research papers. The process we describe will be useful to researchers studying other medical domains. Author Summary: Addressing key biomedical questions often requires systematically reviewing data from numerous studies, a process that demands time and expertise. Large language models (LLMs) have shown potential in screening papers and summarizing their content. However, few research groups have fine-tuned these models to enhance their performance in specialized biomedical domains. In this study, we fine-tuned three LLMs to answer questions about studies on the subject of HIV drug resistance: one proprietary LLM (GPT-4o-mini) and two open-source LLMs (Llama3.1-Instruct-70B and Llama3.1-Instruct-8B). To fine-tune the models, we used an instruction set comprising 250 studies of HIV drug resistance and selected 16 questions covering whether studies included viral genetic sequences, patient demographics, and antiviral treatments. We then tested the models on 120 independent research studies. Our results showed that fine-tuning GPT-4o-mini and Llama3.1-Instruct-70B significantly improved their ability to answer domain-specific questions, while the smaller Llama3.1-Instruct-8B model was not improved. The process we describe offers a roadmap for researchers in other fields and represents a step towards developing an LLM capable of answering questions about research studies across a range of pathogenic human viruses.

8
Machine Learning Prediction of Pharmacogenetic Test Uptake Among Opioid-Prescribed Patients Using Electronic Health Records: A Retrospective Cohort Study

Yaseliani, M.; Hong, J.-W.; Bian, J.; Cavallari, L.; Duarte, J.; Nelson, D.; Lo-Ciganic, W.-H.; Nguyen, K. A.; Hasan, M. M.

2025-09-28 health informatics 10.1101/2025.09.26.25336591 medRxiv
Top 0.1%
23.0%

Background: Opioids are a widely prescribed class of medication for pain management. However, they have variable efficacy and adverse effects among patients due to a complex interplay between biological and clinical factors. Pharmacogenetic (PGx) testing can be utilized to match patients' genetic profiles to individualize opioid therapy, improving pain relief and reducing the risk of adverse effects. Despite its potential, PGx uptake (utilization of PGx testing) remains low due to a range of barriers at the patient, health care provider, infrastructure, and financial levels. Since testing typically involves a shared decision between the provider and patient, predicting the likelihood of a patient undergoing PGx testing and understanding the factors influencing that decision can help optimize resource use and improve outcomes in pain management. Objective: To develop machine learning (ML) models that identify patients' likelihood of PGx uptake based on their demographics, clinical variables, medication use, and social determinants of health (SDoH). Methods: We utilized electronic health record (EHR) data from a single-center healthcare system to identify patients prescribed opioids. We extracted patients' demographics, clinical variables, medication use, and SDoH, and developed and validated ML models, including neural networks (NN), logistic regression (LR), random forests (RF), gradient boosting (XGB), naive Bayes (NB), and support vector machines (SVM), for PGx uptake prediction based on procedure codes. We performed 5-fold cross-validation (CV) and created an ensemble probability-based classifier using the best-performing ML models for PGx uptake prediction. Various performance metrics, uptake stratification analysis, and feature importance analysis were employed to evaluate the performance of the models. Results: The ensemble model using XGB and SVM-RBF classifiers had the highest C-statistic at 79.61%, followed by XGB (78.94%) and NN (78.05%). While XGB was the best-performing single model, the ensemble model achieved high accuracy (67.38%), recall (76.50%), specificity (67.25%), and negative predictive value (99.49%). The uptake stratification analysis using the ensemble model indicated that it can effectively distinguish across uptake probability deciles, where those in the higher strata are more likely to undergo PGx testing in the real world (6.59% in the highest decile compared to 0.12% in the lowest). Furthermore, SHAP value analysis using the XGB model indicated age, hypertension, and household income as the most influential factors for PGx uptake prediction. Conclusions: The proposed ensemble model demonstrated high performance in PGx uptake prediction among patients using opioids for pain. This model can be utilized as a decision support tool, assisting clinicians in identifying patients' likelihood of PGx uptake and guiding appropriate decision-making.
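The "ensemble probability-based classifier" described above amounts to combining per-model predicted probabilities and thresholding the result; a minimal soft-voting sketch follows. The probability values in the usage note are made-up toy numbers, not the study's XGB/SVM-RBF outputs.

```python
def ensemble_predict(prob_lists: list, threshold: float = 0.5):
    """Soft-voting ensemble: average aligned probability lists from several
    models, then apply a decision threshold to the mean probability."""
    n_models = len(prob_lists)
    means = [sum(ps) / n_models for ps in zip(*prob_lists)]
    labels = [int(p >= threshold) for p in means]
    return means, labels
```

For example, with two models scoring a patient 0.9 and 0.7, the ensemble probability is 0.8 and the patient is flagged as likely to undergo testing; averaging tends to smooth out the individual models' miscalibrations.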

9
Comparing Machine and Deep Learning Models for Pediatric Anxiety Classification using Structured EHRs and Area-based Measures of Health Data

Lee, E. W.; Choo, S.; Maguire, D.; Shivanna, A.; Santel, D.; Bhatnagar, S.; Goethert, I.; Patterson, K.; Gholap, J.; Hanson, H. A.; Chandrashekar, M.; Ammerman, R. T.; Pestian, J. P.; Glauser, T.; Brokamp, C.; Strawn, J. R.; Kapadia, A.; Agasthya, G.

2025-05-02 health informatics 10.1101/2025.05.01.25326789 medRxiv
Top 0.1%
22.9%

Objective: This study investigates the performance of various machine learning (ML) and deep learning (DL) models to classify pediatric patients at risk of anxiety disorders using electronic health records (EHRs). By leveraging EHR data and including Area-based Measures of Health (ABMH) data, this approach aims to enable proactive care by monitoring potential anxiety onset comprehensively across various age groups. Methods: We trained a series of ML and DL models to classify youth at risk of developing anxiety disorders. ML models (Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors, XGBoost) and DL models (LSTM, GRU, RETAIN, Dipole) were trained using structured EHR data from 30-day periods before anxiety diagnoses. Two datasets per age group were used: one with structured EHR data only and another incorporating both structured EHR and ABMH data. Model performance was assessed using accuracy, AUROC, AUPRC, PPV, NPV, and F1 scores. Results: The ML models provided a solid performance baseline; XGBoost performed strongly across age groups, with AUROC scores of 0.817 (structured EHR) and 0.816 (structured EHR + ABMH). Among the DL models, RETAIN and Dipole performed best. For example, RETAIN achieved AUROC scores of 0.851 (structured EHR) and 0.853 (structured + ABMH), while Dipole scored 0.853 and 0.857, respectively, for 8-year-olds. These results underscore the viability of both ML and DL models for the early detection of pediatric anxiety disorders. Conclusion: This study comprehensively investigated ML and DL models for diagnosing pediatric anxiety. We demonstrated that ML and DL models can effectively monitor probable anxiety onset within an EHR system, both with and without ABMH data. We found that model performance varied with age, indicating the need for age-specific model development for effective clinical predictive analytics.

10
Machine learning models to detect opioid misuse in Emergency Department patients at triage.

Chhablani, C.; Shahid, U.; Parde, N.; Muslmani, S.; Hu, H.; Thorpe, D.; Afshar, M.; Karnik, N. S.; Chhabra, N.

2025-07-18 emergency medicine 10.1101/2025.07.18.25331782 medRxiv
Top 0.1%
22.8%

Objective: Emergency department (ED) encounters represent valuable opportunities to initiate evidence-based treatments for patients with opioid misuse, but few patients receive such care. Universal manual screening has been proposed to improve patient identification but is uncommon due to its time- and resource-intensive nature. We sought to determine the feasibility of identifying patients with opioid misuse at the time of ED triage using machine learning (ML). Methods: We conducted a retrospective cohort study of 1,123 ED encounters (September 2020 - March 2023) at a tertiary hospital. Encounters were enriched for opioid misuse, manually annotated, and chronologically split for training, validation, and testing. Candidate triage-time features included patient demographics, Emergency Severity Index, arrival time of day, chief complaint, comorbidities, and chronic medications. Model performance was evaluated using F1 score, area under the precision-recall curve (AUPRC), accuracy, recall, and AUROC. Post-hoc explainability analyses included SHapley Additive exPlanations (SHAP) and feature importance. Results: All models performed comparably to opioid-related diagnosis codes placed at any time during the encounter. Random Forest (F1=0.75 [95% CI 0.70-0.83], AUPRC=0.88 [0.81-0.93], accuracy=0.79 [0.70-0.83]) and Gradient Boosting (F1=0.77 [0.71-0.82], AUPRC=0.89 [0.85-0.93], accuracy=0.81 [0.72-0.84]) had among the highest F1 scores and AUPRCs, but confidence intervals overlapped with other methods. Explainability analyses highlighted prior drug-use diagnosis codes, triage acuity, and age as top predictors. Conclusion: ML classifiers leveraging routinely collected triage data offer a feasible alternative to manual screening for flagging opioid misuse before physician evaluation, potentially enabling early harm-reduction interventions. Prospective multi-site validation, calibration, and bias assessments are warranted.
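The F1 score, recall, and accuracy figures quoted above all derive from a binary confusion matrix; a small helper makes the relationships explicit. The counts in the test are toy values, not this study's results.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

Note that in an enriched cohort like this one, accuracy alone can look flattering; that is why the study leads with F1 and AUPRC, which are sensitive to performance on the positive (misuse) class.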

11
Development and validation of a near-comprehensive RxNorm valueset of opioid medications

Wasz, M.; Shankar, P. R. V.; Sprouse, E.; Kirchner, L.; Garber, M.; Jones, J.; Mandl, K.; McMurry, A.

2024-11-05 health informatics 10.1101/2024.11.05.24316759 medRxiv
Top 0.1%
22.8%

Objective: Develop a near-comprehensive opioid medications valueset for population measures of opioid-related treatments and outcomes. The opioid valueset should be free, open source, and conform to the RxNorm standard federally mandated in every US-certified electronic health record. Materials and Methods: The Cumulus opioid valueset was manually curated by the authors and expanded using computer-assisted curation. Opioid classifier rules were developed to select opioid RxNorm concepts with known opioid receptor interactions, ingredients, keywords, and drug product formulations. Twelve publicly available valuesets were used to develop and validate the Cumulus opioid valueset. Validation accuracy was measured against a corpus of opioid medication orders and non-opioid pain relievers. Results: Cumulus opioid valueset recall was >99.9% when measured against opioid prescription RxNorm codes from UC Davis Health and Brigham and Women's Hospital. The Cumulus opioid valueset was 100% specific compared to three valuesets of non-opioid pain relievers. Discussion and Conclusion: To the authors' knowledge, the Cumulus opioid valueset is the largest publicly available valueset of opioid medications (8,926 RxNorm concepts). The intended use of this opioid valueset is for population health measures of opioid medications and related patient outcomes.
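Rule-based selection of opioid concepts by ingredient and keyword, in the spirit of the classifier rules described above, can be sketched as follows. The ingredient and keyword lists here are small illustrative assumptions, not the Cumulus valueset or its actual rules.

```python
# Hypothetical ingredient and keyword rule lists (not the Cumulus rules).
OPIOID_INGREDIENTS = {"morphine", "oxycodone", "fentanyl", "buprenorphine"}
OPIOID_KEYWORDS = {"opioid", "opiate"}

def is_opioid_concept(concept_name: str) -> bool:
    """Flag an RxNorm-style concept name if it mentions a known opioid
    ingredient or an opioid keyword."""
    tokens = set(concept_name.lower().replace("/", " ").split())
    if tokens & OPIOID_INGREDIENTS:
        return True
    return bool(tokens & OPIOID_KEYWORDS)
```

Splitting on "/" lets combination products such as "Acetaminophen / Oxycodone" match on their opioid ingredient, which is one reason ingredient rules recall more concepts than keyword rules alone.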

12
Machine Learning for Antibiotic Stewardship in the Treatment of Staphylococcus Bacterial Infections

Brokowski, T. J.; Chiang, J.

2022-11-29 health informatics 10.1101/2022.11.28.22282797 medRxiv
Top 0.1%
22.1%

Antibiotic resistance is one of the leading issues in modern healthcare due to the inability to treat common infections with available antibiotics. Many of the mechanisms of resistance have been caused by the inappropriate prescription of antibiotics to treat illnesses such as the cold or flu, or the over-prescription of broad-spectrum antibiotics. Epitomizing this problem is the Staphylococcus bacterium, certain strains of which have become resistant to penicillin-related drugs and vancomycin, one of the treatments for MRSA. To address this, we developed machine learning models to predict antibiotic activity and susceptibility using a patient's entire available electronic health record. We selected patients who were suspected of having a staph infection from the Medical Information Mart for Intensive Care III (MIMIC-III) dataset and utilized their microbiological culture results to identify the number of patients who were prescribed an inappropriate antibiotic, and then proposed suitable alternatives. In our test set, we identified that empiric prescriptions had an efficiency rate of 40 percent (the rate at which an antibiotic that would provide activity was prescribed); in the other 60 percent of cases, the infection was either not susceptible to the prescribed antibiotic or not tested for susceptibility against it. Our best models identified antibiotic susceptibility with AUROCs up to 0.9 and raw specificity up to 0.7. The models were also able to propose suitable alternatives in all but 10 cases. Overall, these results demonstrate the need for implementing clinical decision support systems that advise clinicians during the prescription process, and our further work will address this issue.

13
Data-Driven Insights on Opioid Use and Health Behavior Trends Following Decriminalization: Zero-Shot Sentiment and Behavior Analysis

Harfi Moridani, S.; Yang, C.; Noaeen, M.; Shakeri, Z.

2025-02-12 health informatics 10.1101/2025.02.09.25321976 medRxiv
Top 0.1%
21.9%

Opioid decriminalization has taken on renewed urgency in regions grappling with high mortality and healthcare costs. Traditional assessments often focus on legal or epidemiological data, leaving gaps in understanding how the public actually perceives and reacts to such policies. This paper introduces an AI-driven approach that applies Mistral, a large language model (LLM), to a corpus of over 22,000 Reddit comments discussing British Columbia's decriminalization policy. Our method uses zero-shot classification to track shifts in sentiment and self-reported behaviors related to opioid use and harm reduction. The findings suggest that online conversations initially reflected optimism about reduced stigma and broader acceptance of harm reduction measures, but sentiment became more mixed as policy details and lived experiences surfaced. This pattern indicates that advanced LLM-based text analysis can yield deep insights into the evolving public narrative on health interventions, informing future policymaking and healthcare strategies.

14
Identifying Key Predictive Features for Opioid Use Disorder Using Machine Learning

Akhter, S.; Miller, J. H.

2025-07-15 health informatics 10.1101/2025.07.12.25331446 medRxiv
Top 0.1%
21.7%

Background: Opioid Use Disorder (OUD) continues to pose a pressing public health challenge across the United States, highlighting the critical need for early and accurate risk assessment tools that facilitate prompt prevention and intervention efforts. Machine learning methods have emerged as valuable tools for parsing complex medical datasets and aiding clinical decisions. However, their effectiveness and interpretability largely rely on the appropriateness and quality of the selected input features. Objective: In this work, we conducted a comprehensive comparison of three distinct feature selection strategies, Alternating Decision Tree (ADT)-based scoring, Cross-Validated Feature Evaluation (CVFE), and Hypergraph-Based Feature Evaluation (HFE), to identify the most predictive indicators of OUD. Methods: The analysis was performed using data from the 2023 National Survey on Drug Use and Health (NSDUH), a dataset compiled by RTI International under the direction of the Substance Abuse and Mental Health Services Administration (SAMHSA). This dataset encompasses a broad spectrum of features related to demographics, behavior, mental health, and substance use. Each feature selection method yielded a set of important predictors, which were subsequently used to train eXtreme Gradient Boosting (XGBoost) classification models. To enhance model transparency and interpretability, SHapley Additive exPlanations (SHAP) were employed to illustrate the influence of individual variables on model predictions. Results: The performance of the models was evaluated and compared, with the model informed by CVFE-selected features achieving the best outcomes, demonstrating a predictive accuracy of 79.11% and an area under the curve (AUC) of 0.8652. The top 10 most influential features, based on SHAP value rankings from the best-performing model, included past-year misuse of pain relievers, recent alcohol use disorder, age group, history of asthma, receipt of substance use treatment in the past year, educational attainment, household size, total household income, marital status, and race/ethnicity. The web application, accessible via https://shiny.tricities.wsu.edu/oud-prediction/, offers prediction outcomes, probability metrics, and a SHAP visualization generated from the best model, built using the cross-validation-based approach. Conclusions: The findings highlight the crucial importance of effective feature selection in enhancing both model accuracy and interpretability, ultimately supporting the development of practical, data-driven approaches that may help healthcare providers assess OUD risk and tailor prevention strategies to individual needs. Trial registration: Not applicable, as this research is not a clinical trial.
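The overall pipeline shape — score candidate features by cross-validated predictive value, keep the top ones, and train a gradient-boosted classifier on them — can be sketched briefly. Here scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, and per-feature cross-validation scoring stands in for CVFE; both substitutions are ours, not the paper's exact method, and the synthetic data is illustrative:

```python
# Sketch: rank features by cross-validated accuracy of a small model trained
# on each feature alone, keep the best ones, then fit a boosted classifier.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = (X[:, 2] + X[:, 5] > 0).astype(int)   # only features 2 and 5 are informative

# Cross-validated score of each feature in isolation (stand-in for CVFE).
scores = [
    cross_val_score(GradientBoostingClassifier(random_state=0),
                    X[:, [j]], y, cv=3).mean()
    for j in range(X.shape[1])
]
top = np.argsort(scores)[-2:]             # keep the two best-scoring features
model = GradientBoostingClassifier(random_state=0).fit(X[:, top], y)
print(sorted(top.tolist()))               # should recover the informative features
```

On real survey data the retained feature set would then be passed to SHAP for the per-variable attributions the abstract reports.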

15
Advancing Rheumatology Practice: Systematic Review of Natural Language Processing Applications.

Omar, M.; Glicksberg, B. S.; Reuveni, H.; Nadkarni, G.; Klang, E.

2024-03-09 health informatics 10.1101/2024.03.07.24303959 medRxiv
Top 0.1%
19.8%

Background: With the advent of large language models (LLMs), such as ChatGPT, natural language processing (NLP) is revolutionizing healthcare. We systematically reviewed NLP's role in rheumatology and assessed its impact on diagnostics, disease monitoring, and treatment strategies. Methods: Following PRISMA guidelines, we conducted a systematic search to identify original research articles exploring NLP applications in rheumatology. This search was performed in PubMed, Embase, Web of Science, and Scopus until January 2024. Results: Our search produced 17 studies that showcased diverse applications of NLP in rheumatology, addressing disease diagnosis, data handling, and monitoring. Notably, GPT-4 demonstrated strong performance in diagnosing and managing rheumatic diseases. Performance metrics indicated high accuracy and reliability in various tasks. However, challenges like data dependency and limited generalizability were noted. Conclusion: NLP, and especially LLMs, show promise in advancing rheumatology practice, enhancing diagnostic precision, data handling, and patient care. Future research should address current limitations, focusing on data integrity and model generalizability.

16
Early identification of Family Medicine residents at risk of failure using Natural Language Processing and Explainable Artificial Intelligence

Joshi, A.; Mortezaagha, P.; Inkpen, D.; Seale, E.; Archibald, D.; Noel, K.; Rahgozar, A.

2024-12-08 medical education 10.1101/2024.12.07.24318566 medRxiv
Top 0.1%
19.0%

Background: During residency, each resident is observed and receives feedback based on their performance. Residency training is demanding, and some residents struggle with their academic performance. A competency-based residency training program's success depends on its ability to identify residents in difficulty during their first year of post-graduate education and to provide them with timely intervention and support. Objective: In large training programs such as Family Medicine, identifying residents at risk of failing their certification exams is difficult. We developed an AI system using state-of-the-art technologies in Machine Learning (ML), Deep Learning (DL), Natural Language Processing (NLP), and Explainable AI (XAI) to detect at-risk residents automatically. Materials and Methods: The research was conducted in the 2023-24 academic year. We implemented ML, DL, and NLP models for prediction and performance analysis. The target variable chosen for prediction was whether the resident would fail or pass their certification exam. XAI was used to enhance understanding of the models' inner workings. Results: In total, there were 1,382 data points from residents. The final model, a Support Vector Machine (SVM), achieved an accuracy of 89.05% and an F1 score of 74.54 for multiclass classification when multimodal (text and tabular) data was used. This model outperformed the models that used only qualitative or only quantitative data. Conclusion: Combining qualitative and quantitative data represents a novel approach and provided better classification results. This research demonstrates the feasibility of an automated AI system for the early identification of residents at risk of academic struggle. Prior Abstract Presentation: Abstract presented at the AMEE (An International Association for Medical Education) Conference, Basel, Switzerland, August 24-28, 2024.
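One common way to feed both qualitative narratives and quantitative scores into a single SVM is to vectorize the text (e.g. TF-IDF) and append the scaled tabular columns to the feature matrix. A minimal sketch under that assumption, with an invented toy dataset; the paper's actual features and fusion method may differ:

```python
# Sketch: fuse TF-IDF text features with a scaled numeric column for one SVM.
# The comments, scores, and labels below are invented for illustration.
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

comments = ["strong clinical reasoning", "needs support with time management",
            "excellent communication with patients", "struggles with exam preparation"]
exam_scores = np.array([[82.0], [61.0], [88.0], [55.0]])
at_risk = [0, 1, 0, 1]

tfidf = TfidfVectorizer()
X_text = tfidf.fit_transform(comments)        # qualitative features (sparse)
X = hstack([X_text, exam_scores / 100.0])     # append scaled tabular column
clf = SVC(kernel="linear").fit(X, at_risk)

new = hstack([tfidf.transform(["needs support with exam preparation"]),
              np.array([[0.58]])])
print(int(clf.predict(new)[0]))
```

Scaling the numeric column keeps it on a magnitude comparable to TF-IDF weights, so neither modality dominates the linear kernel by default.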

17
Deep learning application to automatic classification of pharmacist interventions.

Alkanj, A.; Godet, J.; Johns, E.; Gourieux, B.; Michel, B.

2022-12-05 health informatics 10.1101/2022.11.30.22282942 medRxiv
Top 0.1%
18.9%

Background: Pharmacist Interventions (PIs) are actions proposed by pharmacists during the prescription review process to address non-optimal drug use. PIs must be triggered by drug-related problems (DRPs) but can also be recommendations for better prescribing and administration practices. PIs are produced daily as text documents and messages forwarded to prescribers. Although they could be used retrospectively to build safeguards for preventing DRPs, the reuse of PI data is under-exploited. Objective: The objective of this work is to train a deep learning algorithm able to automatically categorize PIs, in order to make the most of this large amount of data. Materials and Methods: The study was conducted at the University Hospital of Strasbourg. PI data were collected over the year 2017. Data from the first six months of 2017 were labelled by two pharmacists, who manually assigned one of the 29 possible classes from the French Society of Clinical Pharmacy classification. A deep neural network classifier was trained to automatically predict the class of PIs from the processed text data. Results: 27,699 labelled PIs were used to train and evaluate a classifier. The accuracy of the prediction calculated on the validation dataset was 78.0%. We predicted classes for the PIs collected in the second half of 2017. Of the 4,460 predictions checked manually, 67 required corrections. These verified data were concatenated with the original dataset to create an extended dataset used to re-train the neural network. The accuracy achieved was 81.0%, showing that the prediction process can be further improved as the amount of data increases. Conclusions: PI classification is beneficial for assessing and improving pharmaceutical care practice. Here we report a high-performance automatic classification of PIs based on deep learning that could find a place in highlighting the clinical relevance of the drug prescription review performed daily by hospital pharmacists.

18
Development Process of a Clinical Decision Support System for Empiric Antibiotic Therapies in Sepsis Patients

Schmiegel, S.; Marchi, H.; Hege, P.; Elkenkamp, S.; Duevel, J.; Duesing, C.; Greiner, W.; Scholz, S. S.; Witzke, D.; Wehmeier, M.; Kaup, O.; Borgstedt, R.; Rehberg, S.; Cimiano, P.; Fuchs, C.

2025-05-29 health informatics 10.1101/2025.05.28.25328512 medRxiv
Top 0.1%
18.8%

Background: The principal treatment against bacterial infections is antibiotic therapy. However, increasing antibiotic resistance poses a major threat to global healthcare systems, and sepsis patients are particularly affected. Those patients urgently need to be treated with the most effective antibiotic therapy to maximize their chances of survival while simultaneously preventing the development of both individual and global resistance. Consequently, in order to select a proper empiric antibiotic therapy, the treating physicians need to account for many different factors. A clinical decision support system (CDSS) aims to support physicians in deciding on a fast and targeted antibiotic therapy. Objective: The purpose of this work is to explore the extent to which the realization of a CDSS is possible based on the data available to us, and to document the insights gained during the development of a foundational model designed to assist physicians in determining empiric treatment options for sepsis patients. In this regard, we aim to highlight the importance of close interprofessional collaboration between scientists from various disciplines and to analyze the effects of data quality and quantity on the performance of our statistical models. Methods: Empirical scientists regularly conducted interviews with medical practitioners in order to acquire the medical knowledge required to develop sound statistical models. We developed and applied two-step cross-sectional as well as time series classification models to carefully preprocessed data of sepsis patients admitted to the intensive care unit of a German hospital. Results: We identified several factors as crucial information for valid decisions on empiric therapy for treating sepsis patients. These include the patient's core data, especially the infection focus. To prevent further resistance, individual risk factors such as travel history and professional background should be considered. The evaluation of a therapy's effectiveness is based mainly on the patient's general condition and blood values such as procalcitonin and interleukin 6. One key factor in the acceptance of a CDSS is the explainability of the results produced by the applied methods. Our models come with mainly moderate but comprehensive predictive ability for all considered empiric antibiotic therapies. Conclusion: This work highlights the importance of interprofessional collaboration between medical experts and model developers, ensuring that data quality and clinical relevance are central to the process. It emphasizes the urgent need for high-quality, comprehensive data to overcome challenges such as data discontinuity and to improve model performance, particularly through enhanced digitization in healthcare. This foundational work will facilitate future efforts to develop a CDSS for treating sepsis patients and to translate it to clinical use.

19
Algorithmic Identification of Potentially High Risk Abdominal Presentations (PHRAPs) to the Emergency Department: A Clinically-Oriented Machine Learning Approach

Kuzma, R.; Saraswathula, V.; Moon, K.; Kelz, R. R.; Friedman, A. B.

2022-02-09 emergency medicine 10.1101/2022.02.08.22270691 medRxiv
Top 0.1%
18.7%

Background: Older adults presenting to emergency departments (EDs) with abdominal pain have been shown to be at high risk of subsequent morbidity and mortality. Yet such presentations are poorly studied in national databases. Claims databases do not record the patient's symptoms at the time of presentation to the ED, but rather the diagnosis after testing and evaluation, limiting study of care and outcomes for these high-risk abdominal presentations. Objectives: We sought to develop an algorithm to define a patient population with potentially high risk abdominal presentations (PHRAPs) using only variables commonly available in claims data. Research Design: We trained a machine learning model to predict abdominal pain chief complaints using the National Hospital Ambulatory Medical Care Survey (NHAMCS), a nationally representative database of abstracted ED medical records. Subjects: All patients contained in NHAMCS data from 2013-2018; 2013-2017 were used for predictive modeling and 2018 was used as a hold-out test set. Measures: Positive predictive value and sensitivity of the predictive algorithm against a hold-out test set of NHAMCS patients the algorithm was blinded to during training. Predictions were assessed for agreement with either a chief complaint of abdominal pain (contained in "Reason for Visit 1") or an expanded definition intended to capture visits for abdominal concerns. These included secondary or tertiary complaints of abdominal pain or other abdominal conditions, another abdominal-related chief complaint (e.g., nausea or diarrhea, but not pain), a discharge diagnosis of an abdominal condition, or receipt of an abdominal CT or ultrasound. Results: After validation on a hold-out dataset, a gradient boosting machine (GBM) was the best-performing machine learning model, but a logistic regression model had similar performance and may be more explainable and useful to future researchers. The GBM predicted a chief complaint of abdominal pain with a positive predictive value of 0.60 (95% CI 0.56, 0.64) and a sensitivity of 0.29 (95% CI 0.27, 0.32). Nearly all false positives still exhibited signs of abdominal concerns: using the expanded definition of "abdominal concern," the model had a PPV of >0.99 (95% CI 0.99, 1.00) and a sensitivity of 0.12 (95% CI 0.11, 0.13). Conclusion: The algorithm we report defines a patient population with abdominal concerns for further study of treatment and outcomes, to inform the development of clinical pathways.
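The two metrics the abstract reports follow directly from the confusion matrix: PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN). A tiny helper with invented counts (chosen so the results land near the reported 0.60 and 0.29):

```python
# PPV and sensitivity from confusion-matrix counts; the counts are invented
# for illustration, not taken from the paper.

def ppv_and_sensitivity(tp, fp, fn):
    ppv = tp / (tp + fp)          # of flagged visits, how many were true positives
    sensitivity = tp / (tp + fn)  # of true positives, how many were flagged
    return ppv, sensitivity

ppv, sens = ppv_and_sensitivity(tp=60, fp=40, fn=147)
print(round(ppv, 2), round(sens, 2))  # -> 0.6 0.29
```

Note the trade-off visible in the abstract: widening the outcome definition pushes PPV toward 1.0 while sensitivity drops, because the denominator of true positives grows much faster than the set of flagged visits.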

20
The Crucial Role of Predictive Models in Childhood Asthma care: Improving Outcomes Through Data-Driven Insights

Chakraborty, A.; Bashar, A. R.

2025-07-23 health informatics 10.1101/2025.07.23.25332082 medRxiv
Top 0.1%
18.7%

Background: Asthma is one of the most prominent chronic diseases in children and one of the most challenging ailments to diagnose in infants and preschoolers in the United States. Predictive models can be instrumental in offering a data-driven approach to improve early diagnosis, personalize treatment strategies, and track disease progression. Using national data, this study focuses on building and comparing high-performing analytical predictive models based on the 28 associated risk factors, and on identifying the factors contributing most to childhood asthma. Methods: Data came from the BRFSS (2011-2020) Asthma Call-Back Survey (ACBS). The cross-sectional study included 9,813 participants with a response rate of 65% (current asthma status positive). Respondents were randomly divided into training and testing samples. A grid-search mechanism was implemented to compute the optimum values of the hyperparameters of the analytical eXtreme Gradient Boosting (XGBoost) model. The fitted XGBoost model was compared with four competing ML models: support vector machine (SVM), random forest, LASSO regression, and GBM. The performance of all the models was compared using accuracy, AUC, precision, and recall. A variable importance plot (VIP) was used to measure the percentage contribution of the predictors to the response, and a SHapley Additive exPlanations (SHAP) plot was used to understand how the predictors relate to the outcome. A chi-square test was used to measure the association between the predictors and the outcome. Results: Asthma diagnosis was found to vary by age group, with the highest prevalence at kindergarten age (31.44%). Of the five predictive models, XGBoost was found to be the best-performing model with an AUC of 0.95, followed by random forest (AUC 0.9345), GBM (AUC 0.9341), SVM (AUC 0.9304), and LASSO (AUC 0.88); however, the random forest model was found to have the highest sensitivity (0.9786) and is hence preferred for initial screening of asthma. The top two contributing predictors were overnight hospitalization visits and time since the last asthma medication, accounting for 24.62% and 20.92% of the asthma status, respectively, per the VIP. Conclusion: The analytical methodology of the model development was found to be instrumental in discovering behavioral health-risk knowledge and in visualizing the significance of predictive modeling from a multidimensional behavioral health survey. These insights can be instrumental in predicting different types of chronic lung diseases affecting people of all ages and can be useful for clinicians to diagnose asthma at an early stage, allowing early intervention and proactive management.
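The grid-search step described above can be sketched compactly. Here scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, and the hyperparameter names, grid values, and synthetic data are illustrative assumptions, not the study's actual configuration:

```python
# Sketch: tune a boosted-tree model over a tiny hyperparameter grid with
# cross-validation, scoring by AUC as the abstract's comparison does.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = (X[:, 0] - X[:, 3] > 0).astype(int)   # synthetic, easily separable outcome
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"max_depth": [2, 3], "learning_rate": [0.05, 0.1]},
    scoring="roc_auc",
    cv=3,
)
grid.fit(X_tr, y_tr)
print(grid.best_params_, round(grid.score(X_te, y_te), 2))
```

The refit best estimator (`grid.best_estimator_`) is what would then be handed to VIP/SHAP analyses for the per-predictor contributions the study reports.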